Internet Archeology: Estimating Individual Application Trends in Incomplete Historic Traffic Traces
Authors
Abstract
Public traffic traces are often obfuscated for privacy reasons, leaving network historians with only port numbers from which to identify past application traffic trends. However, for many applications, assumptions based solely on default port numbers are misleading. Traffic classification based on machine learning could provide a solution. By training a classifier using representative traffic samples, we can differentiate between distinct, but possibly similar, applications in previously anonymised trace files. Using popular peer-to-peer and online game applications as examples, we show that their traffic flows can be separated after the fact without using port numbers or packet payload. We also address how to obtain negative training examples, propose an approach that works with any existing machine-learning algorithm, and present a preliminary evaluation based on real traffic data.
Similar resources
H-Probe: Estimating Traffic Correlations from Sampling and Active Network Probing
An extensive body of research deals with estimating the correlation and the Hurst parameter of Internet traffic traces. The significance of these statistics is due to their fundamental impact on network performance. The coverage of Internet traffic traces is, however, limited since acquiring such traces is challenging with respect to, e.g., confidentiality, logging speed, and storage capacity. ...
Robust estimation of the self-similarity parameter in network traffic using wavelet transform
This article studies the problem of estimating the self-similarity parameter of network traffic traces. A robust wavelet-based procedure is proposed for this estimation task of deriving estimates that are less sensitive to some commonly encountered non-stationary traffic conditions, such as sudden level shifts and breaks. Two main ingredients of the proposed procedure are: (i) the application o...
Robust estimation of self-similarity parameter in network traffic using wavelet transform
This article studies the problem of estimating the self-similarity parameter of network traffic traces. In order to guard against possible departures from standard modelling assumptions, a robust wavelet-based procedure is proposed for this estimation task. Two main ingredients of the proposed procedure are: (i) the application of a robust regression technique for estimating the parameter from ...
Internet Usage at Elementary, Middle and High Schools: A First Look at K-12 Traffic from Two US Georgia Counties
Earlier Internet traffic analysis studies have focused on enterprises [1, 6], backbone networks [2, 3], universities [5, 7], or residential traffic [4]. However, much less is known about Internet usage in the K-12 educational system (elementary, middle and high schools). In this paper, we present a first analysis of network traffic captured at two K-12 districts in the US state of Georgia, also...
Distilling the Internet's Application Mix from Packet-Sampled Traffic
As the Internet continues to grow both in size and in terms of the volume of traffic it carries, more and more networks in the different parts of the world are relying on an increasing number of distinct ways to exchange traffic with one another. As a result, simple questions such as “What is the application mix in today’s Internet?” may produce non-informative simple answers unless they are re...